Getting Idioms into a Lexicon Based Parser's Head
Abstract
This paper gives an account of flexible idiom processing within a lexicon-based parser. The view taken is a compositional one. By default the parser's behaviour is the "literal" one, unless the weight of a particular idiom crosses a certain threshold; a new process is then added. Besides yielding all idiomatic and literal interpretations, the parser embodies some claims about the simulation of human processing.
1. Motivation and comparison with other approaches
Idioms are a pervasive phenomenon in natural languages. For instance, the first page of this paper (even if written by a non-native speaker) includes no fewer than half a dozen of them. Linguists have proposed different accounts of idioms, which derive from two basic points of view. One point of view considers idioms as the basic units of language, with holistic characteristics, perhaps including words as a particular case; the other emphasizes instead that idioms are made up of normal parts of speech, each of which plays a precise role in the complete idiom. An explicit statement within this second approach is the Principle of Decompositionality (Wasow, Sag and Nunberg 1982): "When an expression admits analysis as morphologically or syntactically complex, assume as an operating hypothesis that the sense of the expression arises from the composition of the senses of its constituent parts". The syntactic consequence is that idioms are not different in kind from "normal" forms.
Our view is of the latter kind. We are aware that the flexibility of an idiom depends on how recognizable its metaphorical origin is. In languages with flexible word order, the flexibility of idioms seems to be even more closely linked to the strength of particular syntactic constructions.
Let us now briefly discuss some computational approaches to idiom understanding. Applied computational systems must necessarily be able to analyze idioms.
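The threshold mechanism summarized in the abstract can be pictured with a minimal Python sketch. It is not the authors' parser: the idiom lexicon, the integer weights, the threshold value and the names `IDIOMS`, `THRESHOLD`, `IdiomTracker` and `parse` are all hypothetical. The literal reading is always kept; each word that matches the next expected component of an idiom adds to that idiom's weight, and an idiomatic process is added only once that weight reaches the threshold.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical idiom lexicon (illustrative assumption, not from the paper):
# each idiom is an ordered list of (lemma, weight) pairs, where the weight
# says how strongly that word evokes the idiom. Integer weights avoid
# floating-point surprises at the threshold.
IDIOMS = {
    "kick_the_bucket": [("kick", 4), ("the", 1), ("bucket", 5)],
    "spill_the_beans": [("spill", 5), ("the", 1), ("beans", 4)],
}
THRESHOLD = 5  # hypothetical activation threshold


@dataclass
class IdiomTracker:
    pos: int = 0                        # index of the next expected component
    weight: int = 0                     # evidence accumulated so far
    activated_at: Optional[int] = None  # word index where the idiom process was added


def parse(words):
    """Return (readings, activation points) for a list of lemmas.

    The literal reading is always kept. An idiom's process is added only
    once its accumulated weight reaches THRESHOLD, and its reading is
    reported only if all of its components were eventually matched.
    """
    trackers = {name: IdiomTracker() for name in IDIOMS}
    for i, word in enumerate(words):
        for name, t in trackers.items():
            parts = IDIOMS[name]
            # Advance on the next expected component; other words are
            # simply skipped, a crude stand-in for idiom flexibility.
            if t.pos < len(parts) and parts[t.pos][0] == word:
                t.weight += parts[t.pos][1]
                t.pos += 1
                if t.activated_at is None and t.weight >= THRESHOLD:
                    t.activated_at = i  # threshold crossed: add idiomatic process
    readings = ["literal"]
    for name, t in trackers.items():
        if t.activated_at is not None and t.pos == len(IDIOMS[name]):
            readings.append(name)
    activation = {name: t.activated_at for name, t in trackers.items()}
    return readings, activation


if __name__ == "__main__":
    # Idiom process added at "the" (index 2) and confirmed at "bucket".
    print(parse(["john", "kick", "the", "bucket"]))
    # → (['literal', 'kick_the_bucket'], {'kick_the_bucket': 2, 'spill_the_beans': None})
    # Idiom process added at "the" but never confirmed: literal reading only.
    print(parse(["john", "kick", "the", "ball"]))
```

Because the sketch skips intervening words, it also accepts scattered matches such as "kick the ball into the bucket" and so over-generates idiomatic readings; a real parser would need syntactic constraints on which interveners an idiom tolerates.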
In some systems a preprocessor is dedicated to recognizing idiomatic forms: it replaces the group of words that make up an idiom with the word or words that convey its meaning. In ATN systems, by contrast, especially those oriented towards a particular domain, sequences of particular arcs are sometimes inserted in the network which, if traversed, lead to the recognition of a particular idiom (e.g. PLANES, Waltz 1978). LIFER (Hendrix 1977), one of the most successful applied systems, was based on a semantic grammar; within this mechanism idiom recognition was easy to implement, although flexibility was not taken into account. Of course, none of these systems is intended to give an account of human processing.
PHRAN (Wilensky and Arens 1980) is a system based entirely on pattern recognition. Following Fillmore's view (Fillmore 1979), idiom recognition is considered the basic resource, to the point of replacing the concept of grammar-based parsing altogether. PHRAN relies on a database of patterns (including single words, at the same level) and proceeds deterministically, applying the two principles "when in doubt, choose the more specific pattern" and "choose the longest pattern". The limits of this approach lie in its limited capacity to generate alternative interpretations in case of ambiguity, and in the risk of an excessive spread of nonterminal symbols if the database of idioms is large. A recent work on idioms with a similar perspective is Dyer and Zernik (1986).
The approach we have followed is different. The goals of our work must be stated explicitly: 1) to yield a cognitive model of idiom processing; 2) to integrate